This article analysis pedestrian walking trend throughout the last 9 years -2013/2021-, using open data published by the City of Sydney Council.
For this analysis, we’ll be using the following libraries:
library(ggplot2)
library(knitr)
library(leaflet)
library(sf)
library(tidyverse)
library(lubridate)
library(plotly)
library(gganimate) #GIF maker
#For any missing library, remember to <install.packages('PACKAGE-NAME')>
options(scipen=999) #turn off scientific notation
Data available at:
df <- read.csv('data/Walking_count_surveys.csv', stringsAsFactors = TRUE, encoding = "UTF-8")
sites <- st_read('data/Walking_count_sites.shp') %>%
rename(SiteID=Site_ID) %>%
st_difference()
## Reading layer `Walking_count_sites' from data source
## `D:\Escritorio\Proyectos\Sydney_Walking-trend\data\Walking_count_sites.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 120 features and 4 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 151.1779 ymin: -33.90846 xmax: 151.2323 ymax: -33.8544
## Geodetic CRS: WGS 84
Let’s have a look at the survey dataset…
head(df)
## X.U.FEFF.OBJECTID SiteID Location Description
## 1 6760 2 Botany Road Between Bourke Street and Hansard Street
## 2 6761 2 Botany Road Between Bourke Street and Hansard Street
## 3 6762 2 Botany Road Between Bourke Street and Hansard Street
## 4 6763 2 Botany Road Between Bourke Street and Hansard Street
## 5 6764 2 Botany Road Between Bourke Street and Hansard Street
## 6 6765 2 Botany Road Between Bourke Street and Hansard Street
## Period Year Month TotalCount DateFilter Time_0600 Time_0700
## 1 weekday 2013 Spring 3504 2013 - Weekday - October 186 216
## 2 weekend 2013 Spring 834 2013 - Weekend - October 24 78
## 3 weekday 2014 Autumn 3300 2014 - Weekday - May 168 300
## 4 weekday 2014 Spring 2844 2014 - Weekday - October 78 204
## 5 weekend 2014 Autumn 1134 2014 - Weekend - May 54 78
## 6 weekend 2014 Spring 780 2014 - Weekend - October 24 30
## Time_0800 Time_0900 Time_1000 Time_1100 Time_1200 Time_1300 Time_1400
## 1 582 258 126 42 30 36 78
## 2 72 36 84 66 42 54 30
## 3 504 336 132 42 36 48 150
## 4 204 114 90 84 300 66 126
## 5 108 72 78 72 66 132 114
## 6 78 72 48 72 66 84 12
## Time_1500 Time_1600 Time_1700 Time_1800 Time_1900 Time_2000 Time_2100
## 1 96 258 462 372 294 234 174
## 2 24 54 90 60 18 42 24
## 3 156 306 522 198 138 132 84
## 4 264 282 540 162 162 54 54
## 5 126 78 24 48 30 12 12
## 6 66 42 42 18 54 36 0
## Time_2200 Time_2300
## 1 36 24
## 2 30 6
## 3 24 24
## 4 42 18
## 5 18 12
## 6 12 24
Brief insight:
Each site contains the following information:
- ID & location details
- Period (weekeday / weekend)
- Season of the year (fall / sping)
- Total pedestrian counted per day
- Total pedestrian counted per hour (6 am to 11 pm)
paste('The dataset contains information of ', length(unique(df$SiteID)), ' sites and ', length(unique(df$Location)), ' locations (clustered sites)')
## [1] "The dataset contains information of 122 sites and 62 locations (clustered sites)"
Now, a quick inspection of the geographical data we’ll be we working with:
kable(head(sites))
| OBJECTID | SiteID | Location | SiteDescri | geometry |
|---|---|---|---|---|
| 1 | 2 | Botany Road | Between Bourke Street and Hansard Street | POINT (151.2029 -33.90732) |
| 2 | 3 | King Street | Between Whitehorse Street and Newman Street | POINT (151.1779 -33.89934) |
| 3 | 4 | William Street | Between Crown Street and Palmer Street | POINT (151.2167 -33.87438) |
| 4 | 5 | City Road | Between Carillon Avenue and Forbes Street | POINT (151.188 -33.8917) |
| 5 | 6 | Broadway | Between Buckland Street and Abercrombie Street | POINT (151.1977 -33.88434) |
| 6 | 7 | Railway Square | Between Quay Street and Harris Street | POINT (151.2031 -33.8837) |
It contains information of:
- ID & location details
- Geometry
sites <- sites %>%
select(SiteID, Location) %>% #IDs for joing data
mutate(long = st_coordinates(.)[,1],
lat = st_coordinates(.)[,2])
leaflet(sites) %>%
addTiles() %>%
addProviderTiles(providers$CartoDB) %>%
addCircles(lng = ~long,
lat = ~lat,
weight = 4,
popup = ~paste('<strong>',"Location:",'</strong>', Location))
Many Locations are clustered under the same name. We’ll create a new tag to identify each one of them separately, using the siteID information
sites <- sites %>%
mutate(Location_site=paste(SiteID,'-',Location))
df <- df %>%
select(-Location) %>% #repeated in both sets of data
left_join(sites, by='SiteID')
df <- dplyr::filter(df, !Year==2022) #Spring's not included yet!
days <- df %>%
group_by(Period, Month) %>%
summarise(Amount=n(),
Avg=round(mean(TotalCount),2))
plot_ly(data = days,
x = ~Month,
y = ~Avg,
color = ~Period,
type = "bar") %>%
layout(title = 'Pedestrian behaviour - Periods and days', xaxis = list(title = 'Period'),
yaxis = list(title = 'Average amount of pedestrians'))
In average, there are more pedestrians counted during Autumn rather than Spring.
Weekdays are busier, specially when it comes to Autumn.
There’s a higher decline in the amount of pedestrians walking on weekends during Autumn, representing a drop of 22% of the corresponding amount for the same season during weekdays.
During Spring people tend to walk more on weekends compared to Autumn. For this case, the reduction represents 14% of what is counted on weekdays.
weekdays <- df %>%
dplyr::filter(Period=='weekday') #leaving the weekends behind
weekdays_2 <- weekdays %>%
group_by(Location_site, Month) %>%
summarise(Amount=sum(TotalCount)) %>%
mutate(Perc=Amount/sum(weekdays$Amount)*100) %>%
left_join(sites, by='Location_site') %>%
arrange()
plot_ly(data=weekdays_2,
x = ~Location_site,
y = ~Amount,
color = ~Month,
type = "bar") %>%
layout(title = 'Pedestrian behaviour - Periods and days', xaxis = list(categoryorder = "total ascending"),
yaxis = list(title = 'Average amount of pedestrians'))
spring_over_autumn <- weekdays_2 %>%
pivot_wider(names_from = Month, values_from = Amount) %>%
mutate(DIF_spring=Spring/Autumn*100,
TIME=case_when(
DIF_spring > 100 ~ "Spring",
DIF_spring < 100 ~ "Autumn")) %>%
select(Location_site, DIF_spring, TIME)
qpal <- colorFactor(palette = c("#8da0cb", "#66c2a5"),
domain = c("Spring", "Autumn"))
sites %>%
left_join(spring_over_autumn, by='Location_site') %>%
leaflet() %>%
addTiles() %>%
addProviderTiles(providers$CartoDB) %>%
addCircles(lng = ~long,
lat = ~lat,
radius = ~sqrt(DIF_spring)*5,
color = ~qpal(TIME),
popup = ~paste('<strong>',"Location:",'</strong>', Location_site, '<br/>',
'<strong>',"Most frequent season:",'</strong>', TIME)) %>%
addLegend("bottomright", pal = qpal, values = ~TIME,
title = "Busiest season",
opacity = 1)
George Street is by far the most frequent location, followed by Park Street, Martin Place, Oxford Street and King Street.
weekdays %>% group_by(Location) %>% summarise(Amount=sum(TotalCount)) %>%
arrange(-Amount)
## # A tibble: 62 x 2
## Location Amount
## <chr> <int>
## 1 George Street 10189412
## 2 Park Street 1655114
## 3 Martin Place 1270494
## 4 Oxford Street 977448
## 5 King Street 949544
## 6 Druitt Street 800502
## 7 Pitt Street Mall 774870
## 8 Railway Square 736068
## 9 Elizabeth Street 695310
## 10 Broadway 605376
## # ... with 52 more rows
weekdays_year_LOC <- weekdays %>%
group_by(Location, Year) %>%
summarise(Amount=sum(TotalCount)) %>%
left_join(sites, by='Location') %>%
drop_na()
ggplot(weekdays_year_LOC) +
geom_point(aes(x=Location, y=Amount, size=Amount), color='#301934', fill='#301934', alpha = 0.6, show.legend = FALSE) +
labs(x='Location',
y='Amount of pendestrians counted',
title='Pedenstrian behaviour throughout the years',
subtitle = '{closest_state}',
caption = 'Source: https://data.cityofsydney.nsw.gov.au/')+
theme_minimal()+
scale_y_log10()+ #log scale
theme(axis.text.x = element_text(angle=90, hjust=1, size=6))+
transition_states(states = Year)+
shadow_wake(wake_length = 0.1, alpha=.05, colour = 'grey')
Many site are grouped within 1 single location. Time to fix the aggregation issue, grouping by sites instead of locations…
Which locations increased its amount of pedestrians the most?
weekdays_year_LOCID <- weekdays %>%
group_by(Location_site, Year) %>%
summarise(Amount=sum(TotalCount)) %>%
left_join(sites, by='Location_site')
ggplot(weekdays_year_LOCID) +
geom_col(aes(x=Amount, y=Location_site), color='#301934', fill='#301934', alpha = 0.6, show.legend = FALSE) +
labs(x='Amount of pendestrians counted',
y='Site (disaggregated location)',
title='Pedenstrian behaviour throughout the years',
subtitle = '{closest_state}',
caption = 'Source: https://data.cityofsydney.nsw.gov.au/')+
theme_minimal()+
theme(axis.text.y = element_text(angle=0, size=5))+
transition_states(Year, transition_length = 2, state_length = 1)+
ease_aes('sine-in-out')